
IEEE Journal of Biomedical and Health Informatics

Institute of Electrical and Electronics Engineers (IEEE)

All preprints, ranked by how well they match IEEE Journal of Biomedical and Health Informatics's content profile, based on 14 papers previously published here. The average preprint has a 0.10% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
A deep learning approach to identify seizure-prone and normal patients from their EEG records

Basu, S.; Campbell, R. H.

2022-06-16 health informatics 10.1101/2022.06.15.22276461
#1
114× avg

Various learning models distinguish between an electroencephalogram (EEG) record of a normal patient and one having a seizure. In this paper, we propose a deep-learning-based long short-term memory (LSTM) model to identify whether an EEG record belongs to a seizure-prone patient with a non-seizure record or to a normal patient. The study builds on two datasets, namely the TUH Abnormal EEG Corpus (TUAB) and the TUH EEG Seizure Corpus (TUSZ), which include the classified EEG records for seizure-prone and normal patients. We conducted experiments on both imbalanced and balanced datasets and show results using an LSTM model. We observed that the model performs consistently in both balanced and imbalanced cases using only 5 seconds of EEG data from the patient records. We show that our proposed LSTM model gives test accuracies up to 99.84% in the case of 2-class classification between the non-seizure and normal classes and up to 98.87% in the case of 3-class classification among non-seizure, seizure, and normal classes. This provides a basis for making improved temporal predictions about the occurrences of seizures.

2
Adaptive, Unlabeled and Real-time Approximate-Learning Platform (AURA) for Personalized Epileptic Seizure Forecasting

Yang, Y.; Truong, N. D.; Eshraghian, J. K.; Nikpour, A.; Kavehei, O.

2021-10-02 health informatics 10.1101/2021.09.30.21264287
#1
113× avg

A high-performance event detection system is all you need for some predictive studies. Here, we present AURA: an Adaptive forecasting model trained with Unlabeled, Real-time data using internally generated Approximate labels on the fly. By harnessing the correlated nature of time-series data, a pair of detection and prediction models are coupled together such that the detection model generates labels automatically, which are then used to train the prediction model. AURA relies on several simple principles and assumptions: (i) the performance of an event prediction/forecasting model in the target application remains below the performance of an event detection model, (ii) detected events are treated as weak labels and deemed reliable enough for online training of a predictive model, and (iii) system performance and/or system responsive feedback characteristics can be tuned for a subject-under-test. For example, in medical patient monitoring, this enables personalizing forecasting models. Seizure prediction is identified as an ideal test case of AURA, as pre-ictal brainwaves are patient-specific and tailoring models to individual patients can significantly improve forecasting performance. AURA is used to generate an individual forecasting model for 10 patients, showing an average relative improvement in sensitivity by 14.30% and reduction in false alarms by 19.61%. This paper presents a proof-of-concept for the feasibility of online transfer-learning on a stream of time-series neurophysiological data that paves the way towards a low-power neuromorphic neuromodulation system.

3
Accurate prediction of neurologic changes in critically ill infants using pose AI

Gleason, A.; Richter, F.; Beller, N.; Arivazhagan, N.; Feng, R.; Holmes, E.; Glicksberg, B. S.; Morton, S. U.; La Vega-Talbott, M.; Fields, M.; Guttmann, K.; Nadkarni, G. N.; Richter, F.

2024-04-19 pediatrics 10.1101/2024.04.17.24305953
#1
106× avg

Infant alertness and neurologic changes can reflect life-threatening pathology but are assessed by exam, which can be intermittent and subjective. Reliable, continuous methods are needed. We hypothesized that our computer vision method to track movement, pose AI, could predict neurologic changes in the neonatal intensive care unit (NICU). We collected 4,705 hours of video linked to electroencephalograms (EEG) from 115 infants. We trained a deep learning pose algorithm that accurately predicted anatomic landmarks in three evaluation sets (ROC-AUCs 0.83-0.94), showing feasibility of applying pose AI in an ICU. We then trained classifiers on landmarks from pose AI and observed high performance for sedation (ROC-AUCs 0.87-0.91) and cerebral dysfunction (ROC-AUCs 0.76-0.91), demonstrating that an EEG diagnosis can be predicted from video data alone. Taken together, deep learning with pose AI may offer a scalable, minimally invasive method for neuro-telemetry in the NICU.

4
Epileptic Seizure Detection based on Different Events with XAI and Early Aid System for Patient Aid

Paneru, B.

2025-10-05 health informatics 10.1101/2025.10.03.25337233
#1
105× avg

The goal of this study is seizure detection in four-class datasets covering different seizure stages in epileptic patients. An early notification system is created to simulate a patient experiencing a seizure receiving emergency assistance from caregivers, and a dataset acquired from Mendeley is used to train different models. The effectiveness of the proposed EEGNet-ET-XGB models (EEGBoostNet) against hybrid deep learning models is demonstrated by mean accuracies of 95.88% and 94.41% on stratified cross-validation. On the full dataset, Bi-GRU with attention, bidirectional LSTM-GRU models, and conventional ensemble techniques like XGBoost all perform remarkably well. According to the SHAP interpretability analysis, which is conducted on several models with the aid of SHAP plots, Channel 9 data is the most important feature. The IoT-BCI cloud modeling is adapted to make early notifications for emergency systems. This method is essential for categorizing different seizure types according to occurrences in order to provide early warning and for developing a home automation strategy that will help victims.

5
CovidEnvelope: A Fast Automated Approach to Diagnose COVID-19 from Cough Signals

Hossain, M. Z.; Uddin, M. B.; Ahmed, K. A.

2021-04-20 health informatics 10.1101/2021.04.16.21255630
#1
100× avg

The COVID-19 pandemic has had a devastating impact on the health and well-being of the global population. Cough audio signal classification has shown potential as a screening approach for diagnosing people infected with COVID-19. Recent approaches need costly deep learning algorithms or sophisticated methods to extract informative features from cough audio signals. In this paper, we propose a low-cost envelope approach, called CovidEnvelope, which can classify COVID-19 positive and negative cases from raw data while avoiding the above disadvantages. This automated approach pre-processes cough audio signals by filtering out background noise, generates an envelope around the audio signal, and finally provides outcomes by computing the area enclosed by the envelope. We observed that reliable datasets are also important for achieving high performance. Our approach shows that human verbal confirmation is not a reliable source of information. Finally, the approach reaches a highest sensitivity, specificity, accuracy, and AUC of 0.92, 0.87, 0.89, and 0.89, respectively. The automatic approach only takes 1.8 to 3.9 minutes to compute these performances. Overall, this approach is fast and sensitive in diagnosing people living with COVID-19, regardless of whether they have COVID-19 related symptoms, and thus has vast applicability in human well-being by designing HCI devices incorporating this approach.
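The envelope-and-area idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the boxcar smoothing, the rectified-signal envelope, the `width` parameter, and the function names are all assumptions made for illustration, and the abstract does not state how the computed area is mapped to a positive or negative label.

```python
import numpy as np


def moving_average(x, width):
    """Smooth x with a boxcar filter (a stand-in for the noise-filtering step)."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")


def amplitude_envelope(signal, width):
    """Approximate the envelope as the smoothed absolute value of the signal.

    ASSUMPTION: the abstract does not say how the envelope is built; a
    smoothed rectified signal is one simple, low-cost choice.
    """
    return moving_average(np.abs(np.asarray(signal, dtype=float)), width)


def envelope_area(signal, sample_rate, width=101):
    """Area under the envelope (amplitude x seconds) via the trapezoidal rule."""
    env = amplitude_envelope(signal, width)
    dt = 1.0 / sample_rate
    return float(np.sum((env[1:] + env[:-1]) * 0.5) * dt)
```

Because every step is linear in the signal amplitude, doubling the recording's amplitude doubles the area, which is why some normalisation of recordings would likely be needed before comparing areas across devices.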

6
BioSignal Copilot: Leveraging the power of LLMs in drafting reports for biomedical signals

Liu, C.; Ma, Y.; Kothur, K.; Nikpour, A.; Kavehei, O.

2023-07-06 health informatics 10.1101/2023.06.28.23291916
#1
97× avg

Recent advances in Large Language Models (LLMs) have shown great potential in various domains, particularly in processing text-based data. However, their applicability to biomedical time-series signals (e.g. electrograms) remains largely unexplored due to the lack of a signal-to-text (sequence) engine to harness the power of LLMs. The application of biosignals has been growing due to the improvements in the reliability, noise and performance of front-end sensing, and back-end signal processing, despite lowering the number of sensing components (e.g. electrodes) needed for effective and long-term use (e.g. in wearable or implantable devices). One of the most reliable techniques used in clinical settings is producing a technical/clinical report on the quality and features of collected data and using that alongside a set of auxiliary or complementary data (e.g. imaging, blood tests, medical records). This work addresses the missing puzzle in implementing conversational artificial intelligence (AI), a reliable, technical and clinically relevant signal-to-text (Sig2Txt) engine. While medical foundation models can be expected, reports of Sig2Txt engine in large scale can be utilised in years to come to develop foundational models for a unified purpose. In this work, we propose a system (SignalGPT or BioSignal Copilot) that reduces medical signals to a freestyle or formatted clinical, technical report close to a brief clinical report capturing key features and characterisation of input signal. In its ideal form, this system provides the tool necessary to produce the technical input sequence necessary for LLMs as a step toward using AI in the medical and clinical domains as an assistant to clinicians and patients. To the best of our knowledge, this is the first system for bioSig2Txt generation, and the idea can be used in other domains as well to produce technical reports to harness the power of LLMs. 
This method also improves the interpretability and tracking (history) of information into and out of the AI models. We did implement this aspect through a buffer in our system. As a preliminary step, we verify the feasibility of the BioSignal Copilot (SignalGPT) using a clinical ECG dataset to demonstrate the advantages of the proposed system. In this feasibility study, we used prompts and fine-tuning to prevent fluctuations in response. The combination of biosignal processing and natural language processing offers a promising solution that improves the interpretability of the results obtained from AI, which also leverages the rapid growth of LLMs.

7
Detecting Cerebral Ischemia from Electroencephalography During Carotid Endarterectomy Using Machine Learning

Mina, A. I.; Espino, J. U.; Bradley, A. M.; Thirumala, P. D.; Batmanghelich, K.; Visweswaran, S.

2023-10-05 health informatics 10.1101/2023.10.04.23295638
#1
86× avg

Intraoperative stroke is a major concern during high-risk surgical procedures such as carotid endarterectomy (CEA). Ischemia, a stroke precursor, can be detected using continuous electroencephalographic (cEEG) monitoring of electrical changes caused by changes in cerebral blood flow. However, monitoring by experts is currently resource-intensive and prone to error. We investigated if supervised machine learning (ML) could detect ischemia accurately using intraoperative cEEG. Using cEEG recordings from 802 patients, we trained six ML models, including naive Bayes, logistic regression, support vector classifier, random forest (RF), light gradient-boosting machine (LGBM), and eXtreme Gradient Boosting with random forest (XGBoost RF), and tested them on a validation dataset of 30 patients. Each cEEG recording in the validation dataset was labeled independently by five expert neurophysiologists who regularly perform intraoperative neuromonitoring. We did not derive consensus labels but rather evaluated an ML model in a pairwise fashion using one expert as a reference at a time, due to the experts' variability in label determination, which is typical for clinical tasks. The tree-based ML models, including RF, LGBM, and XGBoost RF, performed best, with AUROC values ranging from 0.92 to 0.93 and AUPRC values ranging from 0.79 to 0.83. Our findings suggest that ML models can serve as the foundation for a real-time intraoperative monitoring system that can assist the neurophysiologist in monitoring patients.

8
Detecting heterogeneous seizures in newborn infants using triple correlation

Smith, G. A.; Henry, J.; van Drongelen, W.

2023-06-16 pediatrics 10.1101/2023.06.09.23291216
#1
81× avg

We detect seizures in newborn infants using a novel method derived from triple correlation, which integrates spatial and temporal structure in neonatal electroencephalograms (EEGs). Triple correlation natively encompasses analogues to a variety of lower-order approaches (auto-correlation, cross-correlation) in addition to introducing higher-order signals, so we hypothesized that our approach would both effectively detect and differentiate notoriously difficult-to-detect and heterogeneous neonatal seizures. Indeed, our method in its simplest form performs comparably well to a current standard of care, amplitude-integrated EEG (aEEG), and by some measures outperforms aEEG, suggesting at a minimum that a combination of triple correlation and aEEG could produce a more effective first-line bedside detector. Moreover, we find that the triple correlation seizure-signal varies between patients, with 1) differences in dominance of either within or between channel correlations and 2) differing levels of higher order structure. We hope that our approach will provide a fertile field for future work in distinguishing and detecting seizures.

9
Investigating automatic speech emotion recognition for children with autism spectrum disorder in interactive intervention sessions with the social robot Kaspar

Milling, M.; Bartl-Pokorny, K. D.; Schuller, B. W.

2022-03-03 health informatics 10.1101/2022.02.24.22271443
#1
80× avg

In this contribution, we present the analyses of vocalisation data recorded in the first observation round of the European Commission's Erasmus Plus project "EMBOA, Affective loop in Socially Assistive Robotics as an intervention tool for children with autism". In total, the project partners recorded data in 112 robot-supported intervention sessions for children with autism spectrum disorder. Audio data were recorded using the internal and lapel microphone of the H4n Pro Recorder. To analyse the data, we first utilise a child voice activity detection (VAD) system in order to extract child vocalisations from the raw audio data. For each child, session, and microphone, we provide the total time child vocalisations were detected. Next, we compare the results of two different implementations for valence- and arousal-based speech emotion recognition, thereby processing (1) the child vocalisations detected by the VAD and (2) the total recorded audio material. We provide average valence and arousal values for each session and condition. Finally, we discuss challenges and limitations of child voice detection and audio-based emotion recognition in robot-supported intervention settings.

10
Features importance in seizure classification using scalp EEG reduced to single timeseries

Naze, S.; Tang, J.; Kozloski, J.; Harrer, S.

2021-07-31 health informatics 10.1101/2021.07.28.21261310
#1
79× avg

Seizure detection and seizure-type classification are best performed using intra-cranial or full-scalp electroencephalogram (EEG). In embedded wearable systems, however, recordings from only a few electrodes are available, reducing the spatial resolution of the signals to a handful of timeseries at most. Taking this constraint into account, we tested the performance of multiple classifiers using a subset of the EEG recordings by selecting a single trace from the montage or performing a dimensionality reduction over each hemispherical space. Our results indicate that Random Forest (RF) classifiers yield more efficient and stable classification performance than Support Vector Machines (SVM). Interestingly, tracking the feature importances using permutation tests reveals that classical EEG spectrum power bands display different rankings across the classifiers: low frequencies (delta, theta) are most important for SVMs while higher frequencies (alpha, gamma) are more relevant for RF and Decision Trees. We reach up to 94.3% ± 5.3% accuracy in classifying absence from tonic-clonic seizures using state-of-the-art sampling methods for unbalanced datasets and a leave-patients-out fold cross-validation policy.

11
An Effective Automated Algorithm to Isolate Patient Speech from Conversations with Clinicians

Jaquenoud, T.; Keene, S.; Shlayan, N.; Federman, A.; Pandey, G.

2022-11-30 health informatics 10.1101/2022.11.29.22282914
#1
77× avg

A growing number of algorithms are being developed to automatically identify disorders or disease biomarkers from digitally recorded audio of patient speech. An important step in these analyses is to identify and isolate the patient's speech from that of other speakers or noise that are captured in a recording. However, current algorithms, such as diarization, only label the identified speech segments in terms of non-specific speakers, and do not identify the specific speaker of each segment, e.g., clinician and patient. In this paper, we present a novel algorithm that not only performs diarization on clinical audio, but also identifies the patient among the speakers in the recording and returns an audio file containing only the patient's speech. Our algorithm first uses pretrained diarization algorithms to separate the input audio into different tracks according to nonspecific speaker labels. Next, in a novel step not conducted in other diarization tools, the algorithm uses the average loudness (quantified as power) of each audio track to identify the patient, and returns the audio track containing only their speech. Using a practical expert-based evaluation methodology and a large dataset of clinical audio recordings, we found that the best implementation of our algorithm achieved near-perfect accuracy on two validation sets. Thus, our algorithm can be used for effectively identifying and isolating patient speech, which can be used in downstream expert and/or data-driven analyses.
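The loudness-based selection step described in this abstract could look roughly like the sketch below. It assumes diarization has already produced one sample array per speaker label; the function names and the `patient_is_loudest` flag are hypothetical. The abstract does not say whether the patient is the louder or the quieter speaker, so that choice is left as an explicit parameter rather than guessed.

```python
import numpy as np


def average_power(track):
    """Mean squared amplitude of one audio track (0.0 for an empty track)."""
    samples = np.asarray(track, dtype=float)
    return float(np.mean(samples ** 2)) if samples.size else 0.0


def pick_patient_track(tracks, patient_is_loudest=True):
    """Return the speaker label selected by average power.

    tracks: dict mapping a diarization speaker label to its samples.
    ASSUMPTION: which end of the loudness ranking marks the patient is not
    stated in the abstract, so it is exposed as a flag.
    """
    powers = {label: average_power(samples) for label, samples in tracks.items()}
    chooser = max if patient_is_loudest else min
    return chooser(powers, key=powers.get)
```

For example, `pick_patient_track({"spk0": quiet_samples, "spk1": loud_samples})` returns `"spk1"` under the default flag; the corresponding track would then be written out as the patient-only audio.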

12
A Novel Hybrid Classical-Quantum Network to Detect Epileptic Seizures

Sameer, M.; Gupta, B.

2022-05-19 health informatics 10.1101/2022.05.18.22275295
#1
77× avg

Background: Machine learning (ML) has paved the way for scientists to develop effective computer-aided diagnostic (CAD) systems. In recent years, epileptic seizure detection using Electroencephalogram (EEG) data and deep learning models has gained much attention. However, in deep learning networks, the bottleneck is a large number of learnable parameters. Method: In this study, a novel approach comprising a 1D-Convolutional Neural Network (CNN) model for feature extraction followed by classical-quantum hybrid layers for classification has been proposed. The proposed technique has only 745 learnable parameters, which is the fewest reported to date. Result: The proposed method has achieved a maximum accuracy, sensitivity, and specificity of 100% for binary classification on the Bonn EEG dataset. In addition, the noise robustness of the proposed model has also been checked. To the best of the authors' knowledge, this is the first study to employ quantum machine learning (QML) to detect epileptic seizures. Conclusion: Thus, the developed hybrid system will help neurologists to detect seizures in online mode.

13
Towards Automated Neonatal EEG Analysis: Multi-Center Validation of a Reliable Deep Learning Pipeline

Hermans, T.; Dereymaeker, A.; Lemmens, K.; Jansen, K.; Usman, F.; Robinson, S.; Naulaers, G.; De Vos, M.; Hartley, C.

2025-10-17 pediatrics 10.1101/2025.10.16.25338113
#1
76× avg

Objective: To evaluate the reliability and generalization of NeoNaid, a fully automated software tool for neonatal EEG analysis, based on functional brain age (FBA) estimation and sleep staging. Methods: NeoNaid combines a multi-task deep learning model with proposed quality control routines detecting artefacts, out-of-distribution inputs, and uncertain predictions. Based on a raw EEG input, it outputs one global FBA estimate and a continuous 2-state hypnogram. We validated performance in two independent hospital settings: an internal dataset (33 EEGs, 17 infants, median 900 minutes/recording) and an external dataset (38 EEGs, 24 infants, median 124 minutes/recording). Results: Quality control rejected a comparable number of segments in the internal and external datasets, reducing extreme errors in FBA estimation, and modestly improving sleep staging accuracy. Across the internal and external data, NeoNaid achieved median absolute FBA errors of 0.50 and 0.55 weeks and Cohen's kappa values of 0.89 and 0.87 for quiet sleep detection, respectively. Conclusions: NeoNaid demonstrated improved reliability through integrated quality control and robust generalization across recording setups. Significance: By focusing on validation and trustworthiness, this work takes an essential step toward clinical adoption of automated neonatal EEG analysis and supports its utility for both NICU practice and large-scale research.

14
Predicting epileptic seizures using nonnegative matrix factorization

Stojanovic, O.; Pipa, G.

2019-06-25 health informatics 10.1101/19000430
#1
76× avg

This paper presents a procedure for the patient-specific prediction of epileptic seizures. To this end, a combination of nonnegative matrix factorization (NMF) and smooth basis functions with robust regression is applied to power spectra of intracranial electroencephalographic (iEEG) signals. The resulting time and frequency components capture the dominant information from power spectra, while removing outliers and noise. This makes it possible to detect structure in preictal states, which is used for classification. Linear support vector machines (SVM) with L1 regularization are used to select and weigh the contributions from different number of not equally informative channels among patients. Due to class imbalance in data, synthetic minority over-sampling technique (SMOTE) is applied. The resulting method yields a computationally and conceptually simple, interpretable model of EEG signals of preictal and interictal states, which shows a good performance for the task of seizure prediction.

15
Highlighting the importance of semantics in COVID-19 fake news detection using a CNN hashmap color-based method

El Azhari, M.

2022-07-25 health informatics 10.1101/2022.07.24.22277975
#1
76× avg

With the exponential development and exploitation of social media sites and platforms such as Facebook, Twitter and Instagram, a diverse range of news reaches users, resulting in a major influence on human health and safety. The spread of misinformation and disinformation during the Covid-19 pandemic has become increasingly significant. Although it is usually not a criminal act, it can seriously endanger public health. Such infodemic movements are often led by states to advance geopolitical interests, by opportunists and individuals to achieve some sort of profit, or to discredit official sources. Hence, it has become crucial to automate the detection of fake news in order to shield people from any harmful repercussions. In this paper, the importance of semantics in Covid-19 fake news detection is highlighted based on a convolutional neural network classifier and a hashmap color-based technique. The experiments are performed with CoAID (Covid-19 heAlthcare mIsinformation Dataset), and the results show that the loss of semantics leads to poor classifier performance. This implies additional constraints on the training images, with a focus on creating a CNN-based color hashmap classifier that includes anterior and posterior neighbors.

16
Dementia Prediction in Older People through Topic-cued Spontaneous Conversation

Rutkowski, T. M.; Abe, M. S.; Tokunaga, S.; Otake-Matsuura, M.

2021-05-19 health informatics 10.1101/2021.05.18.21257366
#1
76× avg

An increase in dementia cases is producing significant medical and economic pressure in many communities. This growing problem calls for the application of AI-based technologies to support early diagnostics, and for subsequent non-pharmacological cognitive interventions and mental well-being monitoring. We present a practical application of a machine learning (ML) model in the domain known as AI for social good. In particular, we focus on early dementia onset prediction from speech patterns in natural conversation situations. This paper explains our model and study results of conversational speech pattern-based prognostication of mild dementia onset indicated by predictive Mini-Mental State Exam (MMSE) scores. Experiments with elderly subjects are conducted in natural conversation situations, with four members in each study group. We analyze the resulting four-party conversation speech transcripts within a natural language processing (NLP) deep learning framework to obtain conversation embedding. With a fully connected deep learning model, we use the conversation topic changing distances for subsequent MMSE score prediction. This pilot study is conducted with Japanese elderly subjects within a healthy group. The best median MMSE prediction errors are at the level of 0.167, with a median coefficient of determination equal to 0.330 and a mean absolute error of 0.909. The results presented are easily reproducible for other languages by swapping the language model in the proposed deep-learning conversation embedding approach.

17
Model-Based Assessment of Photoplethysmogram Signal Quality in Real-Life Environments

Su, Y.-W.; Hao, C.-C.; Liu, G.-R.; Sheu, Y.-C.; Wu, H.-T.

2024-06-09 health informatics 10.1101/2024.06.07.24308621
#1
76× avg

Assessing signal quality is crucial for photoplethysmogram analysis, yet a precise mathematical model for defining signal quality is often lacking, posing challenges in the quantitative analysis. To tackle this problem, we propose a Signal Quality Index (SQI) based on the adaptive non-harmonic model (ANHM) and a Signal Quality Assessment (SQA) model, which is trained using the boosting learning algorithm. The effectiveness of the proposed SQA model is tested on publicly available databases with experts' annotations. Result: The DaLiA database [20] is used to train the SQA model, which achieves favorable accuracy and macro-F1 scores in other public databases (accuracy 0.83, 0.76 and 0.87 and macro-F1 0.81, 0.75 and 0.87 for DaLiA-testing dataset, TROIKA dataset [31], and WESAD dataset [23], respectively). This preliminary result shows that the ANHM model and the model-based SQI have potential for establishing an interpretable SQA system.

18
A Novel Explainable AI Method to Assess Associations between Temporal Patterns in Patient Trajectories and Adverse Outcome Risks: Analyzing Fitness as a Risk Factor of ADRD

Shao, Y.; Zamrini, E. Y.; Ahmed, A.; Cheng, Y.; Nelson, S. J.; Kokkinos, P.; Zeng-Treitler, Q.

2024-05-17 health informatics 10.1101/2024.05.17.24307541
#1
76× avg

We present a novel explainable artificial intelligence (XAI) method to assess the associations between the temporal patterns in the patient trajectories recorded in longitudinal clinical data and the adverse outcome risks, through explanations for a type of deep neural network model called Hybrid Value-Aware Transformer (HVAT) model. The HVAT models can learn jointly from longitudinal and non-longitudinal clinical data, and in particular can leverage the time-varying numerical values associated with the clinical codes or concepts within the longitudinal data for outcome prediction. The key component of the XAI method is the definition of two derived variables, the temporal mean and the temporal slope, which are defined for the clinical concepts with associated time-varying numerical values. The two variables represent the overall level and the rate of change over time, respectively, in the trajectory formed by the values associated with the clinical concept. Two operations on the original values are designed for changing the values of the two derived variables separately. The effects of the two variables on the outcome risks learned by the HVAT model are calculated in terms of impact scores and impacts. Interpretations of the impact scores and impacts as being similar to those of odds ratios are also provided. We applied the XAI method to the study of cardiorespiratory fitness (CRF) as a risk factor of Alzheimer's disease and related dementias (ADRD). Using a retrospective case-control study design, we found that each one-unit increase in the overall CRF level is associated with a 5% reduction in ADRD risk, while each one-unit increase in the changing rate of CRF over time is associated with a 1% reduction.
A closer investigation revealed that the association between the changing rate of CRF level and the ADRD risk is nonlinear, or more specifically, approximately piecewise linear along the axis of the changing rate on two pieces: the piece of negative changing rates and the piece of positive changing rates.
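One way to picture the two derived variables in this abstract is sketched below. The abstract only says that the temporal mean captures the overall level and the temporal slope captures the rate of change, so the arithmetic mean and the ordinary least-squares slope used here are assumptions, as are the two hypothetical operations `shift_level` and `tilt_trend`, which illustrate how each variable could be changed without affecting the other.

```python
import numpy as np


def temporal_mean(values):
    """Overall level of a trajectory (ASSUMED: arithmetic mean)."""
    return float(np.mean(np.asarray(values, dtype=float)))


def temporal_slope(times, values):
    """Rate of change over time (ASSUMED: least-squares slope)."""
    t = np.asarray(times, dtype=float)
    v = np.asarray(values, dtype=float)
    t_centered = t - t.mean()
    denom = float(np.sum(t_centered ** 2))
    if denom == 0.0:  # a single time point has no defined trend
        return 0.0
    return float(np.sum(t_centered * (v - v.mean())) / denom)


def shift_level(values, delta):
    """Hypothetical operation: raise the mean by delta, slope unchanged."""
    return np.asarray(values, dtype=float) + delta


def tilt_trend(times, values, delta):
    """Hypothetical operation: raise the slope by delta, mean unchanged."""
    t = np.asarray(times, dtype=float)
    return np.asarray(values, dtype=float) + delta * (t - t.mean())
```

Under these assumptions, feeding shifted or tilted trajectories to a trained model and comparing the predicted risks would isolate the effect of each variable, which is the kind of comparison the impact scores in the abstract appear to summarise.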

19
Interictal Epileptiform Discharge Detection Using Probabilistic Diffusion Models and AUPRC Maximization

Kuhlmann, L.; ZHANG, L.; Nhu, D.; Zhao, Y.; Kwan, P.; Ge, Z.; Foster, E.; Millist, L.; Lan, D.

2025-05-23 health informatics 10.1101/2025.05.23.25328198
#1
76× avg

Recently, automated Interictal Epileptiform Discharge (IED) detection has attracted significant attention as a challenging predictive data analysis task aimed at improving early epilepsy diagnosis. Automated IED detection simplifies visual inspection and assists clinicians in identifying crucial IED waveform patterns in electroencephalographic (EEG) brain activities. However, IEDs are vastly outnumbered by non-IED or background data, and directly training on such data has a detrimental impact on model performance. Moreover, most existing methods lack high precision when tested on cross-institution datasets, which leads to time wasted by the neurologist having to look at predicted IEDs that are not IEDs. To address these issues, we propose a novel approach that employs probabilistic diffusion models for data augmentation alongside AUPRC maximization methods. This approach effectively addresses data scarcity and imbalance by combining real and synthesized IEDs to fully balance the training dataset, resulting in a 4.5% increase in precision and a 0.7% improvement in the F1-score during within-data evaluation. Additionally, it proves to be robust and generalizable, as evidenced by a 40.04% and 18.74% enhancement in precision and F1-score, respectively, when applied to cross-data evaluation using data from another hospital.

20
Detecting Alzheimer's Disease in EEG Data with Machine Learning and the Graph Discrete Fourier Transform

Mootoo, X. S.; Fours, A.; Dinesh, C.; Ashkani, M.; Kiss, A.; Faltyn, M.

2023-11-02 health informatics 10.1101/2023.11.01.23297940
#1
76× avg

Alzheimer's Disease (AD) poses a significant and growing public health challenge worldwide. Early and accurate diagnosis is crucial for effective intervention and care. In recent years, there has been a surge of interest in leveraging Electroencephalography (EEG) to improve the detection of AD. This paper focuses on the application of Graph Signal Processing (GSP) techniques using the Graph Discrete Fourier Transform (GDFT) to analyze EEG recordings for the detection of AD, by employing several machine learning (ML) and deep learning (DL) models. We evaluate our models on publicly available EEG data containing 88 patients categorized into three groups: AD, Frontotemporal Dementia (FTD), and Healthy Controls (HC). Binary classification of dementia versus HC reached a top accuracy of 85% (SVM), while multiclass classification of AD, FTD, and HC attained a top accuracy of 44% (Naive Bayes). We provide novel GSP methodology for detecting AD, and form a framework for further experimentation to investigate GSP in the context of other neurodegenerative diseases across multiple data modalities, such as neuroimaging data in Major Depressive Disorder, Epilepsy, and Parkinson's disease.